Conversation
This commit executes a major architectural refactor, decomposing the monolithic `client.py` and `usage_manager.py` files into modular, domain-specific packages to improve maintainability and separation of concerns (a minimal sketch of the resulting shape follows below).

- **Client Refactor**: `RotatingClient` is now a lightweight facade delegating to:
  - `RequestExecutor`: Unified retry and rotation logic.
  - `StreamingHandler`: Stream processing and error detection.
  - `CredentialFilter`: Tier and priority compatibility filtering.
  - `ModelResolver`: Model name resolution and whitelisting.
  - `ProviderTransforms`: Provider-specific request mutations.
- **Usage Manager Refactor**: `UsageManager` logic is now distributed across:
  - `TrackingEngine`: Usage recording and window management.
  - `LimitEngine`: Enforcement of cooldowns, caps, and fair cycle limits.
  - `SelectionEngine`: Credential selection strategies (balanced/sequential).
  - `CredentialRegistry`: Stable identity management for credentials.
  - `UsageStorage`: Resilient async JSON persistence.
- **Core Infrastructure**: Added `src/rotator_library/core` for shared types, error definitions, and centralized configuration loading.

BREAKING CHANGE: The files `src/rotator_library/client.py` and `src/rotator_library/usage_manager.py` have been deleted. `client.py` is replaced by the `client` package. `usage_manager.py` is replaced by the `usage` package. Direct imports from `rotator_library.usage_manager` must be updated to `rotator_library.usage` or top-level exports.
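For orientation, here is a minimal, self-contained sketch of the facade shape this commit describes. The component names come from the commit message; every signature and body below is an illustrative stand-in, not the library's actual code.

```python
# Hedged sketch of the facade layout; bodies are stand-ins, not real logic.
import asyncio


class ModelResolver:
    """Stand-in for model name resolution / whitelisting."""

    def resolve(self, model: str) -> str:
        return model.split("/", 1)[-1]  # e.g. strip a provider prefix


class RequestExecutor:
    """Stand-in for the unified retry and rotation loop."""

    async def execute(self, model: str, messages: list) -> dict:
        return {"model": model, "choices": messages}


class RotatingClient:
    """Lightweight facade delegating to domain-specific components."""

    def __init__(self) -> None:
        self.resolver = ModelResolver()
        self.executor = RequestExecutor()

    async def acompletion(self, model: str, messages: list) -> dict:
        return await self.executor.execute(self.resolver.resolve(model), messages)


print(asyncio.run(RotatingClient().acompletion("openai/gpt-4o", [{"role": "user", "content": "hi"}])))
```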
…mplementation

- Preserve the original monolithic implementation in `_client_legacy.py` and `_usage_manager_legacy.py`.
- Update `RequestExecutor` to support transaction logging, request sanitization, and consecutive quota failure detection.
- Implement `ConcurrentLimitChecker` in the usage limit engine to enforce `max_concurrent` constraints.
- Improve `StreamingHandler` with robust error buffering for fragmented JSON responses.
- Add fair cycle reset logic and quota baseline synchronization to the new `UsageManager`.
This overhaul introduces smart queuing for credential acquisition and shared quota tracking.

- Implement async waiting in `UsageManager`: calls to `acquire_credential` now block (up to a deadline) using asyncio primitives when keys are busy or on cooldown (see the sketch after this list).
- Add quota group synchronization: request counts are now synced across models that share a specific quota pool (e.g., Antigravity variants).
- Add support for cached prompt token tracking in usage statistics.
- Refactor `RequestExecutor` to reuse a shared `httpx.AsyncClient` for better performance.
- Correct token counting for Antigravity models by including preprompt overhead.

BREAKING CHANGE: `UsageManager.acquire_credential` is now `async` and must be awaited. `RequestExecutor` now requires an `http_client` argument during initialization.
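The deadline-bounded waiting can be illustrated with a small sketch. The real `UsageManager` also weighs cooldowns, tiers, and priorities; this only shows the asyncio waiting pattern, and the `CredentialPool` class and its fields are hypothetical.

```python
# Minimal sketch: block until a credential frees up or the deadline passes.
import asyncio
import time


class CredentialPool:
    def __init__(self, credentials: list[str]) -> None:
        self._free: asyncio.Queue[str] = asyncio.Queue()
        for cred in credentials:
            self._free.put_nowait(cred)

    async def acquire_credential(self, deadline: float) -> str:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("deadline passed before a credential was free")
        try:
            # Suspend this caller until release() hands us a credential.
            return await asyncio.wait_for(self._free.get(), timeout=remaining)
        except asyncio.TimeoutError:
            raise TimeoutError("no credential became free before the deadline") from None

    def release(self, cred: str) -> None:
        self._free.put_nowait(cred)


async def main() -> None:
    pool = CredentialPool(["key-a"])
    key = await pool.acquire_credential(deadline=time.monotonic() + 5)
    try:
        print("using", key)
    finally:
        pool.release(key)


asyncio.run(main())
```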
- **error_handler**: Implement logic to extract `quotaValue` and `quotaId` from Google/Gemini error responses for better rate limit observability.
- **streaming**: Remove legacy `UsageManager` support and the `__init__` method from `StreamingHandler`; usage recording is now delegated to `CredentialContext`.
- **client**: Handle the `_parent_log_dir` internal parameter to configure the transaction logger output directory.
This commit introduces a comprehensive overhaul of the usage tracking system to support more complex quota management scenarios and integration capabilities.

- **Granular Usage Scopes**: Usage windows can now be scoped to specific models, quota groups, or credentials via `WindowDefinition.applies_to`, enabling precise limit enforcement (e.g., shared model quotas vs. individual key limits).
- **Cost Calculation**: Integrated `litellm` cost calculation for both standard and streaming requests. `approx_cost` is now tracked and persisted in usage statistics.
- **Provider Hooks**: Added `HookDispatcher` and the `on_request_complete` interface, allowing provider plugins to intercept request results, override usage counts, or trigger exhaustion based on custom logic (see the sketch after this list).
- **Usage API**: Introduced the `UsageAPI` facade (`src/rotator_library/usage/integration/api.py`) to provide a stable interface for external components to query state and manage cooldowns.
- **Fair Cycle Enhancements**: Refined fair cycle tracking with support for global state persistence and configurable tracking modes (credential-level vs. group-level).
- **Configuration**: Expanded environment variable support for custom caps, fair cycle settings, and sequential fallback multipliers.
- **Persistence**: Updated storage schema to handle nested `model_usage` and `group_usage` statistics.

Also in this commit:

- feat(client): add automatic request validation via provider plugins
- fix(streaming): correctly track cached vs. uncached tokens in usage stats
- refactor: add backward compatibility shim in `src/rotator_library/usage_manager.py`
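The hook mechanism can be sketched as follows. The `on_request_complete` name comes from the commit; the `HookResult` fields and the dispatcher behavior are illustrative assumptions.

```python
# Hedged sketch of a provider hook pipeline; field names are hypothetical.
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class HookResult:
    override_request_count: Optional[int] = None  # replace the recorded count
    mark_exhausted: bool = False                  # force a cooldown for the credential


class ProviderHook(Protocol):
    def on_request_complete(self, model: str, usage: dict) -> Optional[HookResult]: ...


class HookDispatcher:
    def __init__(self, hooks: list[ProviderHook]) -> None:
        self._hooks = hooks

    def dispatch(self, model: str, usage: dict) -> list[HookResult]:
        # Collect every non-None result; the caller applies overrides/exhaustion.
        results = []
        for hook in self._hooks:
            result = hook.on_request_complete(model, usage)
            if result is not None:
                results.append(result)
        return results
```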
Refines the request executor and client to support deep observability and dynamic quota management within the modular architecture.

- **Observability**:
  - Implement a sanitized LiteLLM logging callback to safely pipe provider logs to the library logger.
  - Capture response headers in `UsageManager`, `CredentialContext`, and failure logs to aid debugging.
  - Pass request headers to the failure logger for better context.
- **Usage Management**:
  - Implement `_apply_usage_reset_config` to dynamically generate rolling window definitions (e.g., daily limits) based on provider settings.
  - Fix `fair_cycle_key` resolution logic in the tracking engine.
- **Client & Executor**:
  - Support passing provider-specific LiteLLM parameters via `RequestExecutor`.
  - Update `BackgroundRefresher` to support the new usage manager registry and prevent crashes when managers are missing.
…ization

This commit overhauls how usage statistics are aggregated and initialized to support shared quotas and improve stability.

- **Quota Groups**: Implemented `_backfill_group_usage` to derive shared group statistics (e.g., tiered limits) from individual model windows. Updated `CooldownChecker` to enforce limits at both model and group levels.
- **Initialization**: Added `initialize_usage_managers` with async locking to `RotatingClient`. Updated `main.py` and `background_refresher.py` to invoke this explicitly, ensuring state is loaded before traffic.
- **Persistence**: Switched storage mechanisms to `ResilientStateWriter` and `safe_read_json` to prevent data corruption during writes (see the sketch after this list).
- **Providers**: Refined quota calculations (calculating `quota_used` from fractions) for Chutes, Firmware, and NanoGPT.
- **Antigravity**: Updated system prompts to strictly enforce parallel tool calling behavior.
- **Logging**: Implemented batched logging for quota exhaustion events to reduce noise.
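The PR doesn't show `ResilientStateWriter` itself; this is a minimal sketch of the standard atomic-write pattern such writers typically build on: write to a temp file in the same directory, fsync, then `os.replace`, so readers never observe a half-written JSON document.

```python
# Hedged sketch of atomic JSON persistence; the function name is hypothetical.
import json
import os
import tempfile


def atomic_write_json(path: str, data: dict) -> None:
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())  # ensure bytes hit disk before the swap
        os.replace(tmp_path, path)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp_path)  # clean up the temp file on any failure
        raise
```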
…t execution

This commit introduces comprehensive usage tracking and refactors the client execution flow for better observability and stability.

- Refactor `RotatingClient.acompletion` to be explicitly `async`, ensuring proper execution in `proxy_app`.
- Implement detailed usage summaries in `get_usage_stats`, including token caching percentages, approximate costs, and detailed provider states.
- Add granular logging in `RequestExecutor` to trace credential availability (displaying blocks by cooldowns, fair cycle, or caps) and current quota window saturation.
- Introduce debounced state saving in `UsageManager` to optimize storage I/O (see the sketch after this list) and add logic for backfilling model usage data.

BREAKING CHANGE: `RotatingClient.acompletion` is now an `async` function and must be awaited by the caller.
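Debounced saving coalesces bursts of state changes into a single write per quiet period. A minimal sketch, assuming an async save callable; the actual `UsageManager` scheduling details are not shown in the PR.

```python
# Hedged sketch of debounced async persistence; names are hypothetical.
import asyncio


class DebouncedSaver:
    """Coalesce bursts of state changes into one save per quiet period."""

    def __init__(self, save_coro, delay: float = 2.0) -> None:
        self._save = save_coro          # async callable performing the real write
        self._delay = delay
        self._task: asyncio.Task | None = None

    def mark_dirty(self) -> None:
        # Restart the timer on every change; only the last change triggers a save.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = asyncio.get_running_loop().create_task(self._flush_later())

    async def _flush_later(self) -> None:
        try:
            await asyncio.sleep(self._delay)
            await self._save()
        except asyncio.CancelledError:
            pass  # superseded by a newer change
```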
…window manager

- Update `RequestExecutor` to await `usage_manager.get_availability_stats`, ensuring non-blocking execution during availability checks.
- Expose `window_manager` directly as a property on `UsageManager`.
- Refactor `RequestExecutor` to access `window_manager` directly instead of traversing via `limits.windows`.
Expanded the usage tracking system to capture and persist detailed token metrics, specifically reasoning (thinking) tokens and cache creation tokens.

- Updated client executors to extract `reasoning_tokens` and `cache_creation_tokens` from provider responses.
- Extended `UsageStats` and `WindowStats` models to store granular token breakdowns and explicit success/failure counts.
- Adapted storage and aggregation logic to persist and calculate these new metrics across quota windows.
…tions

- Archive the existing `RotatingClient` and `UsageManager` logic into new `_legacy` modules (`_client_legacy.py` and `_usage_manager_legacy.py`). This preserves the stable baseline implementation to facilitate reference and comparison during the ongoing core architecture refactor.
- Add `src/rotator_library/providers/example_provider.py` as a comprehensive reference template. This file documents the standard patterns for implementing providers with advanced usage management, quota groups, custom token counting, and background refresh jobs.
…ntation

This introduces a thread-safe mechanism using `contextvars` to accurately track and report internal API retries within providers (see the sketch after this list).

- Implement `ContextVar` retry counting in `AntigravityProvider` to capture hidden API costs from internal retries (e.g., on empty responses or malformed calls).
- Update `ExampleProvider` with comprehensive patterns for custom usage handling, including retry counting and error-based cooldowns.
- Expand documentation for `UsageAPI` and `HookDispatcher` with detailed usage examples and architectural context.
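The core of the `contextvars` technique is a per-context counter, so concurrent requests never see each other's retries. A minimal sketch; the variable and helper names are hypothetical, not the actual `AntigravityProvider` internals.

```python
# Hedged sketch of ContextVar-based retry counting; names are hypothetical.
import contextvars

# Each asyncio task / request context gets its own value, isolating
# concurrent requests from one another.
internal_retries: contextvars.ContextVar[int] = contextvars.ContextVar(
    "internal_retries", default=0
)


def record_internal_retry() -> int:
    count = internal_retries.get() + 1
    internal_retries.set(count)
    return count


def consume_retry_count() -> int:
    """Read and reset the counter when reporting usage for the request."""
    count = internal_retries.get()
    internal_retries.set(0)
    return count
```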
Updates the usage manager to prevent lower (stale) API usage values from overwriting higher (current) local counters during background synchronizations. API providers often return cached data or update in increments, causing local state regression.

- Implement `max(local, api)` logic for request counts to ensure monotonic growth (see the sketch after this list).
- Add a `force` parameter to `UsageManager` and the quota trackers to allow manual overrides.
- Preserve accurate local tracking while allowing forced resets.
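The monotonic-merge rule itself fits in a few lines; a hedged sketch with assumed parameter names:

```python
def merge_request_count(local: int, api: int, force: bool = False) -> int:
    """Keep the higher of the local and API-reported counts.

    Providers may return cached/stale totals, so a plain overwrite would let
    the counter move backwards. `force=True` allows deliberate resets.
    """
    if force:
        return api
    return max(local, api)


assert merge_request_count(local=42, api=30) == 42              # stale API value ignored
assert merge_request_count(local=42, api=30, force=True) == 30  # manual override
```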
This introduces a compatibility layer that allows the `RotatingClient` to accept Anthropic-format requests, translating them to the internal OpenAI format for processing and converting responses back.

- Implement `AnthropicHandler` with support for `messages` and `count_tokens`.
- Integrate the handler into `RotatingClient` to enable direct Anthropic SDK usage.

Also in this commit:

- feat(usage): add `force_refresh_quota` and `reload_usage_from_disk` for manual state management
- refactor(usage): implement `reload_from_disk` to sync local state without external calls
I'm starting a comprehensive review of this major core refactor. With 62 files changed and a significant shift towards modularity in `client.py` and `usage_manager.py`, I'll be taking a deep dive into the new architecture. Expect my full feedback shortly.
Overall Assessment
This is a phenomenal refactor of the library core. Moving from the monolithic `client.py` and `usage_manager.py` to a modular, engine-based architecture is a massive leap forward for the project's maintainability and extensibility. The new design is clean, robustly handles complex logic (like fair cycles and shared quota pools), and provides a much more intuitive API through the `UsageManager` and `RotatingClient` facades.
The code quality is exceptionally high, with great attention to detail in concurrency control, persistence resilience, and schema migrations.
Architectural Feedback
- Modularity: The separation of concerns into `SelectionEngine`, `LimitEngine`, `TrackingEngine`, and `WindowManager` is excellent. It makes the logic much easier to reason about and test.
- Credential Registry: The introduction of stable IDs based on OAuth metadata or API key hashes is a clever solution to the problem of tracking usage across path changes.
- Context Manager Pattern: Using `CredentialContext` for credential acquisition and release ensures that resources are handled correctly even in the face of exceptions.
Key Suggestions
- Shared Setup in Executor: `_execute_non_streaming` and `_execute_streaming` share a lot of initialization and filtering logic. Consolidating this into a helper would reduce duplication.
- Logging Noise: Consider lowering the log level for credential releases in `UsageManager` to avoid flooding logs in high-traffic scenarios.
- Engine Redundancy: The empty `release` method in `TrackingEngine` could be removed if it's not intended to hold logic, avoiding confusion about where concurrency state is managed.
Nitpicks and Minor Points
- I noticed the `StreamingHandler` is now very cleanly integrated with the new context manager pattern.
- The migration logic in `UsageStorage` for the v1 -> v2 schema looks solid and handles the legacy path-based keys gracefully.
Questions for the Author
None. The documentation and implementation are very clear.
This review was generated by an AI assistant.
src/rotator_library/usage/manager.py
```python
            lib_logger.info(
                f"Released credential {self._mask_accessor(state.accessor)} "
                f"from {model} (remaining concurrent: {remaining}"
                f"{f'/{state.max_concurrent}' if state.max_concurrent else ''})"
            )
        else:
            state.active_requests = max(0, state.active_requests - 1)
            lib_logger.info(
                f"Released credential {self._mask_accessor(state.accessor)} "
                f"from {model} (remaining concurrent: {state.active_requests}"
                f"{f'/{state.max_concurrent}' if state.max_concurrent else ''})"
            )
```
In high-concurrency environments, logging every credential release might generate excessive noise in the logs. Consider changing this to DEBUG level or adding a conditional check if detailed concurrency tracking is needed.
…counts

- Extract `_aggregate_model_windows` to unify how usage stats are summed across quota groups and credentials, reducing code duplication.
- Implement `_reconcile_window_counts` to ensure success and failure counters remain consistent (mathematically valid) when total request counts are updated from external quota sources.
- Enable synchronization and backfilling of credential-level windows to reflect aggregated model usage.
- Simplify `update_usage` and `_backfill_group_usage` logic by leveraging the new shared aggregation helpers.
This commit extracts the initialization, credential filtering, and validation steps into a reusable `_prepare_execution` helper method within `RequestExecutor`. This ensures consistency and reduces code duplication between streaming and non-streaming request handlers.

Also in this commit:

- refactor(usage): remove unused `release` method from `TrackingEngine`
Follow-up Review
This update makes great strides in refining the core refactor. The architectural improvements in request execution and the more robust usage aggregation logic are excellent additions.
Assessment of New Changes:
- Shared Setup: The consolidation in `RequestExecutor` is well-implemented and addresses the previous feedback perfectly.
- Engine Cleanup: The removal of the redundant `release` method in `TrackingEngine` simplifies the tracking flow.
- Improved Aggregation: The new `_aggregate_model_windows` and `_reconcile_window_counts` logic in `UsageManager` provides a much more resilient way to handle shared quota groups and out-of-sync external metrics. It's a significant improvement over the previous model-syncing approach.
Overall Status:
The PR is in excellent shape. The only remaining item is the logging level adjustment for credential releases, which I've flagged again to ensure it doesn't get missed. Once that's addressed, I'm fully confident in this refactor.
This review was generated by an AI assistant.
src/rotator_library/usage/manager.py
```python
        async with lock:
            state.active_requests = max(0, state.active_requests - 1)
            remaining = state.active_requests
            lib_logger.info(
```
As suggested in the previous review, this log level should be changed to DEBUG. In high-concurrency scenarios, logging every credential release at INFO level creates significant noise.
Refactors the usage tracking system to remove the monolithic `UsageStats` structure in favor of distinct `ModelStats`, `GroupStats`, and `TotalStats` containers. This eliminates complex window aggregation and backfilling logic by recording usage directly to relevant scopes via a unified `UsageUpdate` mechanism (see the sketch after this list).

- Replace `UsageStats` with `TotalStats`, `ModelStats`, and `GroupStats` in `CredentialState`
- Remove `_backfill_group_usage`, `_sync_quota_group_counts`, and window aggregation logic
- Centralize request recording in `TrackingEngine` using a new `UsageUpdate` dataclass
- Update storage persistence to serialize/deserialize the new schema
- Adapt limit checkers (`WindowLimitChecker`, `CustomCapChecker`) to access specific stats scopes
- Update client executor and API integration to reflect the new data model
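A sketch of what a unified update record of this kind can look like. The `UsageUpdate` name comes from the commit; the exact fields are assumptions based on the metrics mentioned elsewhere in this PR (tokens, cached tokens, cost, success/failure).

```python
# Hedged sketch of a unified usage-update record; fields are assumptions.
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageUpdate:
    model: str
    group_key: Optional[str] = None   # quota group, if the model shares a pool
    requests: int = 1
    prompt_tokens: int = 0
    completion_tokens: int = 0
    cached_tokens: int = 0
    approx_cost: float = 0.0
    success: bool = True


@dataclass
class ScopeStats:
    requests: int = 0
    successes: int = 0
    failures: int = 0
    tokens: int = 0

    def apply(self, update: UsageUpdate) -> None:
        # One recording path shared by every scope (total, model, group).
        self.requests += update.requests
        self.successes += update.requests if update.success else 0
        self.failures += 0 if update.success else update.requests
        self.tokens += update.prompt_tokens + update.completion_tokens
```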
- Add `window_limits_enabled` configuration (default `False`) to allow disabling local window quota blocking.
- Update `LimitEngine` to only include `WindowLimitChecker` when explicitly enabled in config.
- Restore legacy-style logging for quota exhaustion events, including reset time calculations.
- Re-implement fair cycle exhaustion logging to track credential usage status.
- Modify window exhaustion logic to only apply cooldowns when forced.
- Update `UsageManager` logic to apply quota updates to model windows only when no `group_key` is provided.
- Ensure usage is attributed to the correct scope (group vs. model), preventing model window updates when API-level group quotas are active and specific model attribution may be unknown or irrelevant.
…imestamps

- Standardize the default configuration by replacing the specific `daily` window with a generic 24h `rolling` window.
- Remove the deprecated `total` window definition and add logic to ignore legacy "total" windows during storage parsing.
- Add `*_human` date string fields (e.g., `reset_at_human`, `last_updated_human`) alongside Unix timestamps in storage dumps to improve debugging and observability (sketched below).
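The `*_human` companion fields might be produced along these lines; the storage layout and the helper are assumptions for illustration.

```python
# Hedged sketch: add an ISO-8601 twin next to each Unix timestamp field.
from datetime import datetime, timezone


def with_human_fields(record: dict) -> dict:
    """Add an ISO-8601 `<key>_human` string next to each Unix timestamp."""
    augmented = dict(record)
    for key, value in record.items():
        if key.endswith("_at") and isinstance(value, (int, float)):
            augmented[f"{key}_human"] = datetime.fromtimestamp(
                value, tz=timezone.utc
            ).isoformat()
    return augmented


print(with_human_fields({"reset_at": 1735689600}))
# {'reset_at': 1735689600, 'reset_at_human': '2025-01-01T00:00:00+00:00'}
```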
The quota display logic in `RequestExecutor` now implements a fallback strategy. It prioritizes shared group limits but falls back to model-specific limits if no group window is found or if the group window lacks a limit.

Changes to defaults:

- Remove `WindowDefinition.total` and the default infinite tracking window.
- Simplify the default window configuration to a single primary daily window (removed the 5h window).

Also in this commit:

- chore: update `.gitignore` to anchor paths and exclude `oauth_creds/` and `usage/` directories
The tracking engine now checks `self._config.window_limits_enabled` before evaluating window exhaustion for groups and models. This ensures that usage tracking does not mark resources as exhausted based on window limits when the feature is explicitly disabled.
Inject a shared `provider_instances` cache into core client sub-components to prevent duplicate plugin instantiation.

- Update `CredentialFilter`, `ModelResolver`, `ProviderTransforms`, and `RequestExecutor` to accept an optional `provider_instances` dict.
- Configure `RotatingClient` to pass its central instance registry to these components during initialization.
- Ensure plugin instances are reused across the client lifecycle, improving state consistency and reducing overhead.
This change introduces a `SingletonABCMeta` metaclass to ensure that all `ProviderInterface` implementations behave as singletons (see the sketch after this list). This resolves state fragmentation issues where different components (`UsageManager`, hooks, etc.) held separate instances with distinct caches.

- Implement `SingletonABCMeta` and apply it to `ProviderInterface`.
- Remove redundant local instance caching in `HookDispatcher`, as class instantiation now guarantees a singleton return.
- Add debug logging in `CredentialFilter` to trace instance creation vs. cache hits.
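A singleton-enforcing ABC metaclass generally combines `ABCMeta` with an instance cache keyed by class. A sketch of the idea; the real `SingletonABCMeta` implementation is not shown in the PR.

```python
# Hedged sketch of a singleton-enforcing ABC metaclass.
from abc import ABCMeta, abstractmethod


class SingletonABCMeta(ABCMeta):
    _instances: dict[type, object] = {}

    def __call__(cls, *args, **kwargs):
        # First instantiation wins; later calls return the cached instance,
        # so every component shares the same provider state.
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class ProviderInterface(metaclass=SingletonABCMeta):
    @abstractmethod
    def name(self) -> str: ...


class DemoProvider(ProviderInterface):
    def name(self) -> str:
        return "demo"


assert DemoProvider() is DemoProvider()  # all components share one instance
```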
This commit overhauls how quota exhaustion is determined by delegating the decision to individual providers rather than relying on a generic check in the usage manager.

- Introduce an `is_initial_fetch` flag to distinguish between startup synchronization and background refreshes.
- Update `AntigravityQuotaTracker` to only apply API-based exhaustion during the initial fetch (due to coarse 20% API update increments), relying on local tracking for subsequent updates.
- Update `BaseQuotaTracker` to strictly validate `reset_timestamp`, ignoring it if the quota is full (1.0 remaining) and applying exhaustion immediately if remaining is 0.
- Refactor `UsageManager` to accept an explicit `apply_exhaustion` boolean from providers.
- Add an idempotency check to `TrackingEngine` to prevent duplicate logging of exhaustion events.
…ion parsing

- Introduce `quota_threshold` in the Fair Cycle configuration to control exhaustion based on window limit multipliers.
- Implement robust duration string parsing (e.g., "1d2h30m") and flexible cooldown modes (offsets, percentages) for Custom Caps (see the parsing sketch after this list).
- Update `FairCycleChecker` to utilize `WindowManager` for dynamic quota resolution.
- Allow Custom Caps to define limits independent of window clamps and improve the cooldown calculation logic.
- Remove redundant explicit exhaustion marking in `TrackingEngine` in favor of centralized checker logic.
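Duration strings like "1d2h30m" are typically handled with a small regex parser. A sketch under the assumption that days/hours/minutes/seconds are the accepted units; the actual parser in the PR may differ.

```python
# Hedged sketch of compound duration parsing (e.g. "1d2h30m" -> seconds).
import re

_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}
_DURATION_RE = re.compile(r"(\d+)([dhms])")


def parse_duration(text: str) -> int:
    """Return total seconds for strings like '1d2h30m'."""
    cleaned = text.strip().lower()
    matches = _DURATION_RE.findall(cleaned)
    # Reject inputs with leftover characters the regex didn't consume.
    if not matches or "".join(f"{n}{u}" for n, u in matches) != cleaned:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in matches)


assert parse_duration("1d2h30m") == 86400 + 7200 + 1800
```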
The custom cap logic has been overhauled to independently verify both model-level and group-level limits. Previously, the system resolved a single "most specific" cap, which could allow requests to bypass group limits if a model-specific cap was present (or vice versa).

- Implemented `_find_all_caps` to retrieve both priority/default model caps and group caps simultaneously.
- Updated the `check` method to validate every applicable cap against its respective usage scope (model stats vs. group stats).
- Extracted validation logic into `_check_single_cap` for better readability and reuse.
- Added `get_all_caps_for` to expose all active limits for a given context.
Follow-up Review
This update continues the excellent work on the core refactor. The modularization of the tracking logic into `record_usage` and the transition to specific `ModelStats`/`GroupStats` structures significantly improves the clarity and maintainability of the usage system.
Assessment of New Changes:
- Consolidated Tracking: The refactor of the `TrackingEngine` and `UsageManager` to use `record_usage` is a great improvement. It eliminates a lot of redundant aggregation logic and makes the data flow much clearer.
- Provider Singletons: Implementing `SingletonABCMeta` for providers is a smart move to ensure consistency across different components.
- Improved Cooldowns: The enhanced custom cap cooldown logic, especially the support for offsets from natural reset, provides much better control over rotation behavior.
Key Issues to Address:
- Redundant Logic: There is a duplicate code block in `_execute_non_streaming` in `executor.py` that should be removed.
- Duplication Cleanup: A few minor duplicates (constant definition, `__all__` entry) and missing docstring parameters were noted.
- Logging Level: The log level for credential releases in `UsageManager` still needs to be adjusted to `DEBUG` as requested previously.
Overall Status:
The architecture is now very robust. Once these minor cleanups are addressed, this refactor is in great shape for merge.
This review was generated by an AI assistant.
src/rotator_library/usage/manager.py
```python
        async with lock:
            state.active_requests = max(0, state.active_requests - 1)
            remaining = state.active_requests
            lib_logger.info(
```
As raised in previous reviews, this log level should be changed to DEBUG to avoid log spam in high-traffic environments.
This introduces a "high-water mark" tracking mechanism for usage windows. Even after a window resets (e.g., a daily limit reset), the system now retains the maximum number of requests ever recorded for that specific window definition. - Add `max_recorded_requests` and `max_recorded_at` fields to `WindowStats`. - Update `TrackingEngine` to capture and update peak usage in real-time. - Ensure `WindowManager` carries forward historical max values when rotating or recreating expired windows. - Update storage persistence and manager output to include these new metrics.
Follow-up Review
The addition of historical high-water mark tracking (`max_recorded_requests`) is a great feature. It integrates well with the persistence layer and properly carries forward across window resets.
Assessment of New Changes:
- Historical Tracking: The implementation in `WindowManager` to preserve the max from expired windows is solid.
- Persistence: The storage schema updates correctly handle the new fields.
Status of Previous Feedback:
- Unaddressed Duplications: There are still several blatant duplications in `executor.py`, `usage/__init__.py`, and `defaults.py` that were flagged in the previous turn.
- Logging Level: The `UsageManager` release log level remains at `INFO`. Please shift this to `DEBUG` to prevent log saturation in high-concurrency environments.
Overall Status:
The new functionality is excellent, but please take a moment to address the remaining cleanup items and duplicate logic. These small refinements will significantly improve the production readiness of this core refactor.
This review was generated by an AI assistant.
src/rotator_library/usage/manager.py
```python
        async with lock:
            state.active_requests = max(0, state.active_requests - 1)
            remaining = state.active_requests
            lib_logger.info(
```
As requested in several previous reviews, this log level should be changed to DEBUG to avoid log spam in production environments.
src/rotator_library/usage/manager.py
```python
            )
        else:
            state.active_requests = max(0, state.active_requests - 1)
            lib_logger.info(
```
This log level should also be changed to DEBUG.
…and sorting logic

- Implement support for displaying multiple quota windows (e.g., daily, monthly) per group in dashboard and detail views.
- Refactor `UsageManager` to aggregate quota stats by window structure instead of flattening totals.
- Fix token usage calculations to correctly distinguish between cached and uncached input tokens.
- Add intelligent sorting strategies: providers ordered by attention needed (quota status), and quota groups by lowest remaining percentage.
- Add a configuration toggle to show/hide model-level usage details per provider.
- Improve the UI layout with adjustable column widths, better cooldown grouping, and natural sort order for credentials.
- Defer initialization of window timers (`started_at`, `reset_at`) until the first request is actually recorded.
- Prevent the display of misleading reset times for windows that have not yet been utilized.
- Update quota synchronization logic to only apply remote reset times and timestamps when usage is detected.
- Change the sorting criteria in `QuotaViewer` and `UsageManager` to order by total quota limit (lowest first) rather than remaining percentage.
- Implement alphabetical tie-breaking for groups with equal limits.
- Ensure groups without limits are sorted last.
- Provide a more stable view order that reflects capacity constraints rather than transient usage.
This update enhances the usage statistics tracking and visualization to support advanced proxy features.

- **Usage Logic**: Update `UsageManager` to calculate and expose Fair Cycle exhaustion, group-specific cooldowns, and Custom Cap limits.
- **Status Calculation**: Refine credential status logic to identify a "mixed" state when only specific quota groups are exhausted.
- **UI Indicators**: Add visual markers for Fair Cycle (`:scales:`) and Custom Caps (`:bar_chart:`) in the quota summary and detail views.
- **Rendering**: Standardize emoji usage to Rich text markup (e.g., `:warning:` instead of unicode) across `LauncherTUI` and `QuotaViewer` to ensure consistent alignment and width calculations.
Standardize emoji usage by converting raw unicode characters to `rich` console markup aliases throughout the application. This ensures consistent rendering across different terminal environments and prevents potential encoding issues.

Also in this commit:

- style(ui): adjust column widths in quota viewer for better text alignment
- Remove the deprecated `_client_legacy.py` and `_usage_manager_legacy.py` files.
- Refactor `RequestExecutor` to use a `_get_quota_display` helper method, reducing code duplication in logging logic.
- Remove a duplicate constant definition in `defaults.py`.
- Fix a duplicate export in the `usage` module.
Follow-up Review
This update continues the refinement of the core modular architecture. The implementation of historical high-water mark tracking (`max_recorded_requests`) is a great addition, and the emoji updates in the TUIs improve consistency.
Assessment of New Changes:
- Historical Tracking: The high-water mark logic in `TrackingEngine` is solid and well-integrated.
- Documentation: The `RequestExecutor.__init__` docstrings now correctly describe the previously missing parameters.
- TUI Improvements: The shift to Rich emoji markup in `launcher_tui.py` and `quota_viewer.py` looks good.
Status of Previous Feedback:
- Logging Level: The log level for credential releases in `UsageManager` is still at `INFO`. This has been flagged in every review turn so far and really should be shifted to `DEBUG` to prevent log saturation in high-traffic production environments.
- Code Duplication: While `_prepare_execution` unified the initialization, there is still significant identical code shared between `_execute_non_streaming` and `_execute_streaming` regarding availability logging, credential acquisition, and request preparation.
Key Suggestions:
- Unify Request Execution: Several large blocks in `executor.py` are still duplicated between the streaming and non-streaming paths. Extracting these into helpers (e.g., `_log_availability`, `_prepare_request_kwargs`) would improve maintainability.
- Internal Cleanup: `UsageManager` has some internal duplications (logging branches and the `_mask_accessor` helper) that could be simplified.
This review was generated by an AI assistant.
src/rotator_library/usage/manager.py
```python
        async with lock:
            state.active_requests = max(0, state.active_requests - 1)
            remaining = state.active_requests
            lib_logger.info(
```
As requested in multiple previous reviews, this log level should be changed to DEBUG. In high-concurrency environments, logging every credential release at INFO level creates significant noise.
This is intended to be at info level. Not an issue.
```python
            availability = await usage_manager.get_availability_stats(
                model, quota_group
            )
            blocked = availability.get("blocked_by", {})
            blocked_parts = []
            if blocked.get("cooldowns"):
                blocked_parts.append(f"cd:{blocked['cooldowns']}")
            if blocked.get("fair_cycle"):
                blocked_parts.append(f"fc:{blocked['fair_cycle']}")
            if blocked.get("custom_caps"):
                blocked_parts.append(f"cap:{blocked['custom_caps']}")
            if blocked.get("window_limits"):
                blocked_parts.append(f"wl:{blocked['window_limits']}")
            if blocked.get("concurrent"):
                blocked_parts.append(f"con:{blocked['concurrent']}")
            blocked_str = f"({', '.join(blocked_parts)})" if blocked_parts else ""
            lib_logger.info(
                f"Acquiring key for model {model}. Tried keys: {len(retry_state.tried_credentials)}/"
                f"{availability.get('available', 0)}({availability.get('total', 0)}{blocked_str})"
            )
```
This availability check and logging logic is identical to the one in `_execute_streaming` (lines 525-544). Consider extracting this into a shared private method like `_log_availability`.
```python
            availability = await usage_manager.get_availability_stats(
                model, quota_group
            )
            async with await usage_manager.acquire_credential(
                model=model,
                quota_group=quota_group,
                candidates=untried,
                priorities=filter_result.priorities,
                deadline=deadline,
            ) as cred_context:
                cred = cred_context.credential
                retry_state.record_attempt(cred)

                state = getattr(usage_manager, "states", {}).get(
                    cred_context.stable_id
                )
                tier = state.tier if state else None
                priority = state.priority if state else None
                selection_mode = availability.get("rotation_mode")
                quota_display = self._get_quota_display(
                    state, model, quota_group, usage_manager
                )
                lib_logger.info(
                    f"Acquired key {mask_credential(cred)} for model {model} "
                    f"(tier: {tier}, priority: {priority}, selection: {selection_mode}, quota: {quota_display})"
                )
```
This credential acquisition block is duplicated in `_execute_streaming` (lines 565-590). Since it involves an `async with` context, you might want to extract the preparation logic or use a helper that returns the context.
Updates deployment configuration and documentation to align with the refactored usage tracking system.

- Replace the `key_usage.json` volume mount with the `usage/` directory in docker-compose files.
- Update system documentation to reflect new component paths and per-provider usage files.
- Document new CLI options for raw logging and the TUI quota viewer.

BREAKING CHANGE: The persistence volume for usage statistics has changed from a file (`key_usage.json`) to a directory (`usage/`). Users must create this directory and update their Docker volume mounts to prevent startup errors and ensure usage data is persisted correctly.
…masking

Refactors the library to improve code reusability and maintainability:

- Extract `_log_acquiring_credential` and `_prepare_request_kwargs` methods in `RequestExecutor` to simplify the main `execute` loop.
- Update `mask_credential` in `error_handler` to support a "full" masking style (first 4 + last 4 characters; sketched below).
- Remove the redundant `_mask_accessor` method from `UsageManager` and replace its usage with the centralized `mask_credential` utility.
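The "full" masking style (first 4 + last 4) can be sketched as follows; the signature is inferred from the commit message rather than taken from the actual `error_handler` code.

```python
# Hedged sketch of a first-4/last-4 credential mask; signature is assumed.
def mask_credential(value: str, style: str = "full") -> str:
    if style == "full" and len(value) > 8:
        return f"{value[:4]}...{value[-4:]}"
    # Fall back to a prefix-only mask for short values or other styles.
    return value[:4] + "..." if len(value) > 4 else "****"


assert mask_credential("sk-abcdefghijklmnop") == "sk-a...mnop"
```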
Follow-up Review
This update makes excellent progress on consolidating the request execution logic and standardizing credential masking. The refactoring of executor.py significantly improves the maintainability of the core library.
Assessment of New Changes:
- Shared Setup: The introduction of `_log_acquiring_credential`, `_prepare_request_kwargs`, and `_prepare_execution` in `executor.py` is a great step toward unifying the streaming and non-streaming paths.
- Credential Masking: The migration to a centralized `mask_credential` utility in `error_handler.py` and its consistent use in `UsageManager` is a solid improvement.
- Documentation: The updates to `DOCUMENTATION.md` and `README.md` correctly reflect the new `usage/` directory structure and per-provider persistence.
Status of Previous Feedback:
- Logging Level: The log level for credential releases in `UsageManager` is still at `INFO`. Please shift this to `DEBUG` as previously requested to prevent log saturation in production.
- Structural Duplication: While many helpers were extracted, there are still several identical code blocks shared between the two execution paths in `executor.py` (logging the acquired credential and pre-request callback handling).
Overall Status:
The architecture is now very clean and modular. Addressing the remaining logging level and the few remaining duplications in the executor will bring this refactor to a production-ready state.
This review was generated by an AI assistant.
Refactors the `RequestExecutor` class to improve code maintainability and reduce duplication between the initial request and retry loops.

- Extract `_log_acquired_credential` to centralize detailed logging upon successful credential acquisition.
- Extract `_run_pre_request_callback` to encapsulate callback execution and error handling logic.
- Clean up the `execute` method by replacing verbose inline blocks with these helper methods.
A big refactor of `client.py` and `usage_manager.py` into a modular and extensible system that was VERY needed, allowing much easier addition of new features: providers, windows, hooks.